Late multimodal fusion for image and audio music transcription

Authors

Abstract

Music transcription, which deals with the conversion of music sources into a structured digital format, is a key problem for Music Information Retrieval (MIR). When addressing this challenge in computational terms, the MIR community follows two lines of research: music documents, which is the case of Optical Music Recognition (OMR), or audio recordings, which is the case of Automatic Music Transcription (AMT). The different nature of the aforementioned input data has conditioned these fields to develop modality-specific frameworks. However, their recent definition in terms of sequence labeling tasks leads to a common output representation, which enables research on a combined paradigm. In this respect, multimodal image and audio music transcription comprises the challenge of effectively combining the information conveyed by the image and audio modalities. In this work, we explore this question at the late-fusion level: we study four combination approaches in order to merge, for the first time, the hypotheses of end-to-end OMR and AMT systems in a lattice-based search space. The results obtained for a series of performance scenarios, in which the corresponding single-modality models yield different error rates, showed interesting benefits of these approaches. In addition, the strategies considered significantly improve upon the standard unimodal recognition frameworks.
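The abstract does not spell out the four combination strategies. As a rough illustration of what late fusion of OMR and AMT hypotheses can look like once both systems share an output vocabulary, here is a minimal sketch of a generic log-linear combination of two n-best lists. It assumes each recognizer returns a few candidate symbol sequences with log-probabilities. This is a simplification, not the authors' lattice-based method, and `fuse_nbest`, `weight`, `floor_logprob`, and the pitch-token vocabulary are hypothetical.

```python
import math


def fuse_nbest(omr_nbest, amt_nbest, weight=0.5, floor_logprob=-50.0):
    """Hypothetical log-linear late fusion of two n-best lists.

    Each argument maps a hypothesis (a tuple of output symbols) to the
    log-probability assigned by one modality's recognizer. A hypothesis that
    appears in only one list gets `floor_logprob` for the other modality.
    Returns (hypothesis, fused_score) pairs, best first.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    fused = {}
    for hyp in set(omr_nbest) | set(amt_nbest):
        image_score = omr_nbest.get(hyp, floor_logprob)
        audio_score = amt_nbest.get(hyp, floor_logprob)
        # Weighted sum of log-probabilities = log of a weighted geometric mean.
        fused[hyp] = weight * image_score + (1.0 - weight) * audio_score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Toy n-best lists (hypothetical pitch tokens and probabilities).
omr = {("C4", "D4", "E4"): math.log(0.6), ("C4", "D4", "F4"): math.log(0.4)}
amt = {("C4", "D4", "F4"): math.log(0.7), ("C4", "E4", "F4"): math.log(0.3)}

ranked = fuse_nbest(omr, amt, weight=0.5)
print(ranked[0][0])  # ('C4', 'D4', 'F4')
```

Neither system's own top hypothesis wins here; the fused choice is the sequence both modalities find plausible. The `weight` parameter is where relative trust in each modality would go, which matters when, as in the scenarios described above, the single-modality models have different error rates.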

Similar resources

Multimodal medical image fusion based on Yager’s intuitionistic fuzzy sets

The objective of image fusion for medical images is to combine multiple images obtained from various sources into a single image suitable for better diagnosis. Most of the state-of-the-art image fusing technique is based on nonfuzzy sets, and the fused image so obtained lags with complementary information. Intuitionistic fuzzy sets (IFS) are determined to be more suitable for civilian, and medi...

Automatic Music Transcription and Audio Source Separation

In this article, we give an overview of a range of approaches to the analysis and separation of musical audio. In particular, we consider the problems of automatic music transcription and audio source separation, which are of particular interest to our group. Monophonic music transcription, where a single note is present at one time, can be tackled using an autocorrelation-based method. For p...

Multimodal Feature Extraction and Fusion for Audio-Visual Speech Recognition

Automatic Music Transcription using Audio-Visual Fusion for Violin Practice in Home Environment

Violin practice in a home environment, where there is often no teacher available, can benefit from automatic music transcription to provide feedback to the student. This paper describes a high performance violin transcription system with three main contributions. First, as onset detection is an important but challenging task for automatic transcription of pitched non-percussive music, such as f...

Lyrics-Based Audio Retrieval and Multimodal Navigation in Music Collections

Modern digital music libraries contain textual, visual, and audio data describing music on various semantic levels. Exploiting the availability of different semantically interrelated representations for a piece of music, this paper presents a query-by-lyrics retrieval system that facilitates multimodal navigation in CD audio collections. In particular, we introduce an automated method to time a...

Journal

Journal title: Expert Systems With Applications

Year: 2023

ISSN: 1873-6793, 0957-4174

DOI: https://doi.org/10.1016/j.eswa.2022.119491